Enriching the sequence substitution matrix by structural information.

نویسندگان

  • Octavian Teodorescu
  • Tamara Galor
  • Jaroslaw Pillardy
  • Ron Elber
چکیده

A fundamental step in homology modeling is the comparison of two protein sequences: a probe sequence with an unknown structure and function and a template sequence for which the structure and function are known. The detection of protein similarities relies on a substitution matrix that scores the proximity of the aligned amino acids. Sequence-to-sequence alignments use symmetric substitution matrices, whereas the threading protocols use asymmetric matrices, testing the fitness of the probe sequence into the structure of the template protein. We propose a linear combination of threading and sequence-alignment scoring function, to produce a single (mixed) scoring table. By fitting a single parameter (which is the relative contribution of the BLOSUM 50 matrix and the threading energy table of THOM2) we obtain a significant increase in prediction capacity in the twilight zone of homology modeling (detecting sequences with <25% sequence identity and with very similar structures). For a difficult test of 176 homologous pairs, with no signal of sequence similarity, the mixed model makes it possible to detect between 40 and 100% more protein pairs than the number of pairs that are detected by pure threading. Surprisingly, the linear combination of the two models is performing better than threading and than sequence alignment when the percentage of sequence identity is low. We finally suggest that further enrichment of substitution matrices, combing more structural descriptors such as exposed surface area, or secondary structure is expected to enhance the signal as well.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A 3 D - 1 D Substitution Matrix for Protein

In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...

متن کامل

A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.

In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...

متن کامل

A Protein Structural Alphabet and Its Substitution Matrix CLESUM

By using a mixture model for the density distribution of the three pseudobond angles formed by Cα atoms of four consecutive residues, the local structural states are discretized as 17 conformational letters of a protein structural alphabet. This coarse-graining procedure converts a 3D structure to a 1D code sequence. A substitution matrix between these letters is constructed based on the struct...

متن کامل

Hydropathy Conformational Letter and its Substitution Matrix HP-CLESUM: an Application to Protein Structural Alignment

Motivation: Protein sequence world is discrete as 20 amino acids (AA) while its structure world is continuous, though can be discretized into structural alphabets (SA). In order to reveal the relationship between sequence and structure, it is interesting to consider both AA and SA in a joint space. However, such space has too many parameters, so the reduction of AA is necessary to bring down th...

متن کامل

Structural alignment from sequence alone by integration of predicted secondary structure

Protein sequence alignment is at the core of a variety of fundamental tasks such as homology modeling, fold recognition, evolutionary studies and more. Much attention was dedicated over recent years for alignment of sequences in the twilight zone, where pairwise sequence alignment methods (e.g. BLAST) perform poorly. State of the art methods employ sequence profiles, and sometimes meta-sequence...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Proteins

دوره 54 1  شماره 

صفحات  -

تاریخ انتشار 2004